reprocess failed account events #6571
Merged
mozilla/sumo#2217
mozilla/sumo#2271
Notes
This work introduces a periodic cron task, run every 4 hours, that gathers all account events created within the last 24 hours that are still unprocessed and queues each of them for re-processing. This is separate from the Celery retry mechanism on exceptions, which I reduced from 4 retries to 3 and which plays out over about 15 seconds. With the cron task in place, processing of a failed (unprocessed) account event will be attempted 5-6 times over the 24-hour period starting from the moment of its creation. Account events that remain unprocessed after 24 hours will no longer be re-processed.
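The selection rule described above can be sketched roughly as follows. This is a minimal, framework-free illustration, not the code from this PR: `AccountEvent`, `Status`, `select_events_to_reprocess`, and the `enqueue` callback are hypothetical stand-ins for the real model, its status field, and the Celery task dispatch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSED = "processed"


@dataclass
class AccountEvent:
    # Hypothetical stand-in for the real account event model.
    id: int
    status: Status
    created_at: datetime


# Events older than this are no longer picked up for re-processing.
REPROCESS_WINDOW = timedelta(hours=24)


def select_events_to_reprocess(events, now):
    """Return the unprocessed events created within the last 24 hours."""
    cutoff = now - REPROCESS_WINDOW
    return [
        event
        for event in events
        if event.status is Status.UNPROCESSED and event.created_at >= cutoff
    ]


def reprocess_failed_account_events(events, enqueue, now=None):
    """Queue each eligible event and return the queued ids.

    `enqueue` is an assumed callback standing in for a Celery `.delay(...)` call.
    """
    now = now or datetime.now(timezone.utc)
    selected = select_events_to_reprocess(events, now)
    for event in selected:
        enqueue(event.id)
    return [event.id for event in selected]
```

Because the cutoff is computed from the event's creation time, an event drops out of the selection on its own once it is older than 24 hours, with no extra bookkeeping of attempt counts.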
I've also added DMS Snitches for `stage` and `prod`, as well as defined their URLs for `settings.DMS_REPROCESS_FAILED_ACCOUNT_EVENTS` in our GCP secrets.

As part of this work, I made the following changes to each of the tasks within `kitsune.users.tasks`:

- Removed the `@skip_if_read_only_mode` decorators. They were useless even before we removed the Celery workers from the failover clusters, because those workers would still "steal" events, and they are even more so now that the Celery workers in the failover cluster are gone.
- Added `@transaction.atomic` decorators, so that all of the DB changes for each task are handled as an atomic chunk. If any one of them fails, all of the others are rolled back. This prevents failures from leaving behind a residue of partially completed work, so that retries always start with a clean slate.
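To illustrate the "clean slate" behaviour, here is a small sketch that uses the standard-library `sqlite3` module as a stand-in for Django's `@transaction.atomic`. It is not the code in `kitsune.users.tasks`; the `events` and `audit` tables and the `process_event` and `setup` helpers are invented for the example.

```python
import sqlite3


def setup():
    """Create an in-memory database with one unprocessed event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE audit (event_id INTEGER)")
    conn.execute("INSERT INTO events VALUES (1, 'unprocessed')")
    conn.commit()
    return conn


def process_event(conn, event_id, fail=False):
    """Apply all of a task's DB changes inside a single transaction."""
    # `with conn:` commits if the block succeeds and rolls back if an
    # exception escapes it, similar in spirit to `transaction.atomic`.
    with conn:
        conn.execute(
            "UPDATE events SET status = 'processed' WHERE id = ?", (event_id,)
        )
        conn.execute("INSERT INTO audit (event_id) VALUES (?)", (event_id,))
        if fail:
            # Simulated failure after some of the work has already been done.
            raise RuntimeError("simulated failure after partial work")
```

If `process_event(conn, 1, fail=True)` raises, both the status update and the audit row are rolled back, so a later retry sees the event exactly as it was before the failed attempt.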